
Tasks: HPC-Specific CUI Compliance Roles

Input: Design documents from /specs/004-hpc-cui-roles/

Prerequisites: plan.md, spec.md, research.md, data-model.md, contracts/

Organization: Tasks are grouped by user story to enable independent implementation and testing of each story.

Format: [ID] [P?] [Story] Description
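As a rough illustration of the format above, the sketch below parses one task line into its fields. The concrete shapes are assumptions, not taken from this document: IDs are assumed to look like `T001`, the parallel marker like `[P]`, and story tags like `[US1]`; `parse_task` and `Task` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Assumed line shape: "T001 [P] [US1] Description".
# The ID pattern (T + digits) and story pattern (US + digits) are illustrative guesses.
TASK_RE = re.compile(
    r"^(?P<id>T\d+)"                  # task ID
    r"(?:\s+\[(?P<p>P)\])?"           # optional parallel marker
    r"(?:\s+\[(?P<story>US\d+)\])?"   # optional story tag (setup tasks may have none)
    r"\s+(?P<desc>.+)$"               # free-text description
)


class Task(NamedTuple):
    id: str
    parallel: bool
    story: Optional[str]
    description: str


def parse_task(line: str) -> Optional[Task]:
    """Return the parsed task, or None if the line does not match the assumed shape."""
    m = TASK_RE.match(line.strip())
    if m is None:
        return None
    return Task(m["id"], m["p"] is not None, m["story"], m["desc"])
```

For example, `parse_task("T012 [P] [US1] Create prolog template")` yields a task flagged as parallel and tagged `US1`, while a line without an ID returns `None`.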

Path Conventions

This project uses Ansible collection structure:

  - Roles: roles/{role_name}/tasks/, roles/{role_name}/templates/, roles/{role_name}/files/
  - Playbooks: playbooks/
  - Tests: tests/molecule/{role_name}/, tests/integration/
  - Docs: docs/


Phase 1: Setup (Shared Infrastructure)

Purpose: Role scaffolding and shared infrastructure


Phase 2: Foundational (Blocking Prerequisites)

Purpose: Core infrastructure that MUST be complete before ANY user story can be implemented

⚠️ CRITICAL: No user story work can begin until this phase is complete

Checkpoint: Foundation ready - user story implementation can now begin in parallel


Phase 3: User Story 1 - Slurm CUI Partition Operations (Priority: P1) 🎯 MVP

Goal: Enable secure CUI job submission with authorization verification, memory sanitization, and audit logging

Independent Test: Submit jobs to CUI partition with authorized/unauthorized users, verify prolog blocks unauthorized access, epilog clears memory, and audit logs capture all events
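The authorization half of this test can be sketched as a pure decision function that a prolog wrapper might call. This is a minimal sketch under stated assumptions, not the role's actual prolog: the partition name `cui`, the function name `prolog_decision`, and the audit message layout are all hypothetical. A real prolog would typically take the user and partition from the job environment Slurm provides (for example `SLURM_JOB_USER` and `SLURM_JOB_PARTITION`; check the Prolog and Epilog guide for your Slurm version).

```python
CUI_PARTITION = "cui"  # hypothetical partition name


def prolog_decision(user, partition, authorized_users, cui_partition=CUI_PARTITION):
    """Return (exit_code, audit_message) for a job about to start.

    exit_code 0 lets the job proceed; a non-zero code signals that the
    prolog should refuse it. Every decision produces an audit line.
    """
    if partition != cui_partition:
        # Jobs outside the CUI partition are not subject to this check.
        return 0, f"ALLOW user={user} partition={partition} reason=non-cui-partition"
    if user in authorized_users:
        return 0, f"ALLOW user={user} partition={partition} reason=authorized"
    return 1, f"DENY user={user} partition={partition} reason=not-authorized"
```

Keeping the decision separate from the environment lookup and the logging call makes the authorized and unauthorized cases easy to exercise without a running cluster.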

Implementation for User Story 1

Checkpoint: At this point, User Story 1 should be fully functional - researchers can submit authorized jobs to CUI partition with memory sanitization


Phase 4: User Story 2 - Container Security in CUI Enclave (Priority: P1)

Goal: Enable secure container execution with signed images, network isolation, and execution logging

Independent Test: Attempt to run signed and unsigned containers, access restricted paths, and open network connections; verify that each restriction is enforced
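The three checks in this test can be modeled as a small policy function that lists every violation in a container launch request. This is only a sketch of the expected outcomes, not how a container runtime enforces them: `ContainerRequest`, `violations`, and the allowed bind root `/cui/projects` are hypothetical, and the `image_signed` flag stands in for real signature verification performed elsewhere.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple

ALLOWED_BIND_ROOTS = (PurePosixPath("/cui/projects"),)  # hypothetical allowed root


@dataclass(frozen=True)
class ContainerRequest:
    image_signed: bool            # result of a signature check done elsewhere
    bind_paths: Tuple[str, ...]   # host paths the job wants mounted
    wants_network: bool


def violations(req: ContainerRequest) -> List[str]:
    """Return a list of policy violations; an empty list means the request is allowed."""
    found = []
    if not req.image_signed:
        found.append("unsigned image")
    for raw in req.bind_paths:
        path = PurePosixPath(raw)
        # PurePosixPath does not collapse "..", so reject it explicitly.
        escapes = ".." in path.parts
        inside = any(root == path or root in path.parents for root in ALLOWED_BIND_ROOTS)
        if escapes or not inside:
            found.append(f"restricted path: {raw}")
    if req.wants_network:
        found.append("network access requested")
    return found
```

A signed image that binds only `/cui/projects/p1` and requests no network produces an empty list; an unsigned image that binds `/etc` and requests network produces three entries.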

Implementation for User Story 2

Checkpoint: At this point, User Story 2 should be fully functional - researchers can run signed containers with proper isolation


Phase 5: User Story 3 - Parallel Filesystem Security (Priority: P1)

Goal: Enable secure CUI project storage with ACL management, changelog monitoring, quotas, and sanitization

Independent Test: Create project directories, verify ACLs match FreeIPA groups, test changelog events, quota enforcement, and sanitization
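The ACL check in this test amounts to comparing the groups granted on a directory with the groups expected from the identity source. The sketch below does that against `getfacl`-style text. It assumes the standard `group:<name>:<perms>` entry layout, inspects only access entries (not `default:` entries), and takes the expected groups as a plain set; fetching them from FreeIPA is out of scope here. `acl_groups` and `acl_drift` are hypothetical helper names.

```python
from typing import Set, Tuple


def acl_groups(getfacl_output: str) -> Set[str]:
    """Collect named groups from getfacl-style text, skipping the owning-group entry."""
    groups = set()
    for raw in getfacl_output.splitlines():
        # Drop header comments ("# owner: root") and trailing "#effective:" notes.
        line = raw.split("#", 1)[0].strip()
        fields = line.split(":")
        # Named group entries look like "group:<name>:<perms>";
        # "group::<perms>" is the owning group and has an empty name.
        if len(fields) == 3 and fields[0] == "group" and fields[1]:
            groups.add(fields[1])
    return groups


def acl_drift(expected: Set[str], getfacl_output: str) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unexpected) group names relative to the expected set."""
    actual = acl_groups(getfacl_output)
    return expected - actual, actual - expected
```

An empty pair from `acl_drift` means the directory's named groups match the expected set exactly; anything else names the groups to add or revoke.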

Implementation for User Story 3

Checkpoint: At this point, User Story 3 should be fully functional - storage is secured with ACLs, monitoring, and sanitization


Phase 6: User Story 4 - Node Lifecycle Management (Priority: P2)

Goal: Enable automated node provisioning, compliance scanning, health checks, and secure decommissioning

Independent Test: PXE boot new node, verify compliance scan passes, run health checks between jobs, execute decommissioning

Implementation for User Story 4

Checkpoint: At this point, User Story 4 should be fully functional - nodes have automated lifecycle management


Phase 7: User Story 5 - Researcher Onboarding/Offboarding (Priority: P2)

Goal: Enable automated CUI project onboarding and offboarding with proper access provisioning and revocation

Independent Test: Run onboarding for test project, verify all resources created, run offboarding, verify complete cleanup

Implementation for User Story 5

Checkpoint: At this point, User Story 5 should be fully functional - projects can be onboarded and offboarded automatically


Phase 8: User Story 6 - Interconnect Security Documentation (Priority: P3)

Goal: Generate formal InfiniBand RDMA exception documentation with compensating controls verification

Independent Test: Generate exception documentation, verify compensating controls documented, validate template produces audit-ready artifacts

Implementation for User Story 6

Checkpoint: At this point, User Story 6 should be fully functional - interconnect exception documented for auditors


Phase 9: Polish & Cross-Cutting Concerns

Purpose: Documentation updates, integration, and testing infrastructure


Dependencies & Execution Order

Phase Dependencies

User Story Dependencies

Within Each User Story

Parallel Opportunities


Parallel Example: Phase 1 Setup

# Launch all role directory creation together:
Task: "Create role directory structure for roles/hpc_slurm_cui/"
Task: "Create role directory structure for roles/hpc_container_security/"
Task: "Create role directory structure for roles/hpc_storage_security/"
Task: "Create role directory structure for roles/hpc_interconnect/"
Task: "Create role directory structure for roles/hpc_node_lifecycle/"
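These five tasks touch disjoint directories, which is why they can run together. A minimal sketch of doing so with a thread pool follows; it creates only the `tasks/`, `templates/`, and `files/` subdirectories named under Path Conventions, and `scaffold_role` and `scaffold_all` are hypothetical helper names rather than part of the project.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List

ROLES = [
    "hpc_slurm_cui",
    "hpc_container_security",
    "hpc_storage_security",
    "hpc_interconnect",
    "hpc_node_lifecycle",
]
SUBDIRS = ["tasks", "templates", "files"]  # from the Path Conventions section


def scaffold_role(base: Path, role: str) -> Path:
    """Create roles/<role>/{tasks,templates,files} under base; safe to re-run."""
    role_dir = base / "roles" / role
    for sub in SUBDIRS:
        (role_dir / sub).mkdir(parents=True, exist_ok=True)
    return role_dir


def scaffold_all(base: Path) -> List[Path]:
    """Scaffold every role concurrently; each role writes to its own directory."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda role: scaffold_role(base, role), ROLES))
```

Calling `scaffold_all(Path("."))` from the collection root would create all five role trees; because `mkdir` is called with `exist_ok=True`, repeating the call leaves existing directories untouched.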

Parallel Example: User Story 3 Templates

# Launch filesystem-specific templates together:
Task: "Create roles/hpc_storage_security/templates/lustre_changelog.conf.j2"
Task: "Create roles/hpc_storage_security/templates/beegfs_changelog.conf.j2"

Implementation Strategy

MVP First (User Stories 1-3 Only)

  1. Complete Phase 1: Setup (role scaffolding)
  2. Complete Phase 2: Foundational (meta, defaults, control mappings)
  3. Complete Phase 3: User Story 1 - Slurm CUI Partition
  4. STOP and VALIDATE: Test job submission with prolog/epilog
  5. Complete Phase 4: User Story 2 - Container Security
  6. STOP and VALIDATE: Test container restrictions
  7. Complete Phase 5: User Story 3 - Storage Security
  8. STOP and VALIDATE: Test ACLs and sanitization
  9. At this point, core HPC compliance is operational

Incremental Delivery

  1. Setup + Foundational → Foundation ready
  2. Add US1 (Slurm) → Test → CUI jobs work
  3. Add US2 (Containers) → Test → Containers work
  4. Add US3 (Storage) → Test → Storage secured (MVP complete!)
  5. Add US4 (Node Lifecycle) → Test → Nodes automated
  6. Add US5 (Onboarding) → Test → Projects automated
  7. Add US6 (Interconnect) → Test → Audit-ready
  8. Polish → Full validation

Parallel Team Strategy

With multiple developers:

  1. Team completes Setup + Foundational together
  2. Once Foundational is done:
     - Developer A: User Story 1 (Slurm)
     - Developer B: User Story 2 (Containers)
     - Developer C: User Story 3 (Storage)
  3. After P1 stories complete:
     - Developer A: User Story 4 (Node Lifecycle)
     - Developer B: User Story 5 (Onboarding)
     - Developer C: User Story 6 (Interconnect) + Polish
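The ordering behind this strategy can be sketched as a dependency graph whose "waves" are the sets of phases that may run at the same time. The graph below encodes what this document states (Setup, then Foundational, then user stories in parallel); treating Polish as dependent on all six stories is an assumption made for the example, and `waves` is a hypothetical helper. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter

STORIES = {"US1", "US2", "US3", "US4", "US5", "US6"}

# Maps each phase to the phases it depends on.
DEPENDS_ON = {
    "Setup": set(),
    "Foundational": {"Setup"},
    **{story: {"Foundational"} for story in STORIES},
    "Polish": set(STORIES),  # assumption: Polish waits for every story
}


def waves(graph):
    """Group phases into successive waves; phases within a wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


print(waves(DEPENDS_ON))
# [['Setup'], ['Foundational'], ['US1', 'US2', 'US3', 'US4', 'US5', 'US6'], ['Polish']]
```

The graph allows all six stories in one wave; the split into a P1 round and a P2/P3 round in the list above reflects having three developers and the story priorities, not a dependency between stories.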

Notes